ABSTRACT

This study investigated programming activity in COBOL. Attempts were made
to identify problem areas so that improvements can be made in COBOL
compilers and in the manner in which COBOL is taught. Identification of
problem areas was achieved through examining program changes made by
student programmers during the development of four different COBOL
programs. The data, which was collected from a COBOL course at Purdue
University, consisted of all versions of all programs submitted for
compilation by each student. Thus, the data represented a complete history
of each subject's program development process, beginning with the initial
version compiled and ending with the final version submitted for grading.
All program changes made between two successive versions were classified
into four categories: COBOL-related, algorithmic, cosmetic and
report-generation-related. This classification scheme indicates that a
significant number of changes are related to report generation, which
suggests a need for support in this area. Secondly, all COBOL-related
changes were delineated into 104 error categories. This delineation
suggests that there are several problem areas in COBOL. Finally, the four
categories of program changes were observed with respect to various points
in the program development process. Most COBOL-related changes occur before
the midpoint of the program development process, whereas most cosmetic
changes occur late in the process.

Keywords and Phrases: error-proneness, COBOL, programming languages,
language features

1.  Introduction 

There is no doubt that COBOL is an important programming language. As
part of the early triumvirate (with FORTRAN and ALGOL), COBOL is still
important in business school programs and is the most widespread and
intensively-used language in application programming [Phil73,Lemo79]. In
industry, it has weathered the storm of PL/1 and even seems to be holding
on in a world rapidly filling with PASCAL-trained programmers. There are
those who believe that ADA will ultimately make COBOL obsolete, but the
slow pace of ADA's introduction suggests that if such occurs, it will be
far in the future. Thus, for the present it appears that "real-world"
programmers will continue to construct "real-world" programs in COBOL.

Despite COBOL's widespread use, it suffers from "human engineering
problems" [Nels72]. The language has some features that are difficult to
use safely (e.g. the CORRESPONDING option). Although COBOL has been touted
as "easily-readable", few have ever claimed that it is "easily-writable".
Furthermore, COBOL has received little academic attention [Samm78]. Little
research has been done to attempt to identify problems with this language.

At Purdue University, we have been conducting a research project
investigating COBOL. We are interested in those features of the language
that may be troublesome for programmers. Our goal has been to identify such
features so that (1) they might be emphasized when teaching COBOL, (2)
existing compilers might be altered to provide better diagnostics, and (3)
ultimately some language features might be changed to make them more
usable. In the following sections of this paper, we describe a previous
study of COBOL, report the methodology we employed, and discuss our results.

2.  Previous  Research 

There have been attempts to investigate those constructs in ALGOL, BASIC,
COBOL, FORTRAN and PL/1 which are difficult to use. Youngs [Youn74]
analyzed programs written in these languages and delineated errors into 8
functionally defined categories: allocation, assignment, iteration, I/O
formatting, other I/O, parameter/subscript list, conditionals and vertical
delimiter. He found that these categories accounted for approximately 83
percent of all errors committed.

Youngs' study suggests that COBOL suffers in terms of allocation for two
reasons. First, the allocation of space (for identifiers, tables, etc.) in
COBOL is complex. Consider the following declarations for a 5 X 10 array in
both FORTRAN and COBOL:


FORTRAN

DIMENSION ITEMS (5,10)


COBOL

01 ITEMS-TABLE.
   05 ITEMS-1 OCCURS 5 TIMES.
      10 ITEMS-2 PICTURE 999 OCCURS 10 TIMES.


Note that although the syntax used in COBOL to allocate space for tables
is relatively complex, it provides a greater degree of flexibility. For
example, one can access an entire "row" of the table declared above in
COBOL using ITEMS-1 (I) for I = 1, 2, ..., 5. Secondly, COBOL suffers in
terms of allocation because there is a lack of complete implicit and
default specifications. For example, in the FORTRAN declaration, ITEMS is
implicitly declared to be an array of type integer. COBOL does not provide
such an implicit type specification.

Another study, conducted by Litecky and Davis, examined errors and
error-proneness in COBOL [Lite76]. "Error-proneness" is defined as the
error frequency for a particular language element divided by the number of
usages of that element. Errors from 1,400 runs from 73 students in a
beginning COBOL course were classified according to a scheme established
from a pilot study. The hierarchical classification scheme distinguished
132 types of errors. The highest level consisted of 32 major error classes
such as hyphenation and punctuation. A relatively high frequency was found
for many different types of COBOL errors. For example, a missing period and
a misspelled structural word accounted for 8.6 and 2.6 percent of all COBOL
errors respectively. However, only four error types were declared to be
error-prone:


[1] Period added after the file name specified in a file description (FD).
For example

FD INPUT-FILE-NAME.
   LABEL RECORDS ARE STANDARD.

The period inserted after "INPUT-FILE-NAME" is syntactically incorrect.


[2] The use of commas as word delimiters. The following is an example of a
comma used to delimit the identifiers B and C:

ADD A TO B,C

The proper delimiter is a space rather than a comma.


[3] A missing period after a record name at a level 01. For example

FD INPUT-FILE
   LABEL RECORDS ARE STANDARD.
01 INPUT-RECORD
   05 SOME-FIELD PIC 99.

COBOL syntax requires a period after the group level item "INPUT-RECORD".


[4] Operand(s) of an arithmetic statement are not computational in nature.
For example, the arithmetic statement ADD A TO B is invalid in the context

05 A PIC 999.
05 B PIC ZZ9.

since B is alphanumeric.
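
The error-proneness ratio defined above can be illustrated with a small Python sketch. The element names and counts below are hypothetical, chosen only to show that a frequent error need not be an error-prone one; they are not figures from [Lite76].

```python
def error_proneness(error_count, usage_count):
    """Error frequency for a language element divided by its number of usages."""
    if usage_count <= 0:
        raise ValueError("usage_count must be positive")
    return error_count / usage_count


# Hypothetical (errors, usages) pairs, for illustration only.
elements = {
    "period added after FD file name": (12, 40),
    "missing period": (120, 6000),
}

# Rank elements from most to least error-prone.
ranked = sorted(
    elements,
    key=lambda name: error_proneness(*elements[name]),
    reverse=True,
)
```

In this made-up data the missing period is ten times more frequent, yet the FD period ranks first because it goes wrong in a much larger share of its usages.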

Litecky and Davis also studied the content of specific high-frequency
errors and the accuracy of compiler-generated error diagnostics. They found
that 80 percent of the spelling errors in COBOL could be classified into
only 4 error classes and therefore could be corrected by existing
algorithms. The 4 error classes are


[1] One letter wrong

[2] One letter missing

[3] An extra character inserted

[4] Two adjacent characters transposed
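
As a rough illustration of why these four classes are amenable to automatic correction, the following Python sketch decides which class (if any) relates a typed word to the intended keyword. The function name and the test words are assumptions made for this example; it is not the algorithm used by Litecky and Davis.

```python
def classify_misspelling(typed, intended):
    """Return the single-error class that turns `intended` into `typed`, or None."""
    if typed == intended:
        return None
    lt, li = len(typed), len(intended)
    if lt == li:
        # Positions at which the two words differ.
        diffs = [i for i in range(li) if typed[i] != intended[i]]
        if len(diffs) == 1:
            return "one letter wrong"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and typed[diffs[0]] == intended[diffs[1]]
                and typed[diffs[1]] == intended[diffs[0]]):
            return "two adjacent characters transposed"
        return None
    if lt == li - 1:
        # typed equals intended with exactly one character removed
        if any(intended[:i] + intended[i + 1:] == typed for i in range(li)):
            return "one letter missing"
    if lt == li + 1:
        # intended equals typed with exactly one character removed
        if any(typed[:i] + typed[i + 1:] == intended for i in range(lt)):
            return "an extra character inserted"
    return None
```

A corrector built on this idea would compare an unrecognized word against the reserved-word list and accept a keyword only when exactly one candidate falls into one of the four classes.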


The diagnosis of COBOL errors by the compiler (Control Data Corporation
COBOL compiler for the 6600) was compared with the diagnosis of a
"conversant" human judge. The major finding was that fewer than one in five
errors were accurately diagnosed by the compiler.

The idea behind their research is good, but we believe that their study
has three major shortcomings:

[1] The COBOL errors identified are very low-level. For example, errors
such as a missing hyphen in a FILE-CONTROL clause are very elementary
relative to those errors that will cause problems for experienced
programmers using advanced features. Thus, COBOL errors which most likely
occur at the professional level have not been adequately identified.

[2] The behavior of high-frequency errors and error-proneness has not been
observed over time. Thus, some error types that are claimed to be
error-prone may not be a problem as programmers become more experienced in
COBOL. For example, in our study we found that the frequency for the error
type "period added after FD filename" decreases quickly.

[3] Only one compiler was considered in the study of error diagnosis
accuracy. Therefore any results pertaining to error diagnosis accuracy
cannot be generalized.

3.  Procedure 

Our research attempts to identify problem areas in COBOL by studying
program changes made by programmers who developed several different
programs. A program change is defined as a textual change between
successive versions of a program [Duns80]. Each of the following textual
changes to a program represents one program change:

One or more changes to a single statement. Even multiple character
changes to a statement represent mental activity with only a single
abstract instruction.

One or more statements inserted between existing statements. The
contiguous group of statements inserted probably corresponds to the
concrete statements that represent a single abstract instruction.

A change to a single statement followed by the insertion of new statements.

The  following  textual  changes  to  a  program  are  not  counted  as  program 
changes: 

The  deletion  of  one  or  more  statements.  Deleted  statements  must  usually 
be  replaced  by  other  statements  elsewhere.  The  inserted  statements  are 
counted.  Counting  deletions  as  well  would  give  double  weight  to  such  a 
change. 

The  insertion  of  standard  output  statements.  These  are  occasionally 
inserted  in  a  "wholesale"  fashion  during  debugging. 

Examining  program  changes  for  several  different  programs  developed  by  the 
same  set  of  subjects  enabled  us  to  observe  the  frequency  of  various  error  types 
with  respect  to  time. 
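
The counting rules above can be approximated mechanically. The Python sketch below is only an illustration under stated assumptions: each list element is treated as one statement, "standard output statements" are approximated by statements beginning with DISPLAY, and any contiguous replaced or inserted region reported by `difflib` counts as a single program change. The study itself examined the differences manually.

```python
from difflib import SequenceMatcher


def is_output_statement(stmt):
    # Assumption: debugging output is recognized as a DISPLAY statement.
    return stmt.strip().upper().startswith("DISPLAY")


def count_program_changes(old, new):
    """Count program changes between two versions (lists of statements)."""
    changes = 0
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("equal", "delete"):
            continue  # unchanged text and pure deletions are not counted
        added = new[j1:j2]
        if tag == "insert" and all(is_output_statement(s) for s in added):
            continue  # wholesale insertion of output statements is ignored
        changes += 1  # one changed or inserted contiguous region
    return changes
```

Treating every contiguous region as one change is a simplification: a region in which several statements were each modified would be counted once here, whereas the definition above would count each modified statement separately.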

Our  research  involved  three  major  areas: 

[1] All program changes were classified as algorithmic, COBOL-related,
cosmetic, or report-generation-related. Algorithmic program changes are
those needed to correctly implement an algorithm. For example, changing

IF KEY = DEPT-NO

to

IF KEY = DEPT-NO AND NOT = PREV-DEPT-NO

is considered an algorithmic change since the original statement is
syntactically correct. The change is made to correctly implement the
chosen algorithm. COBOL-related changes are those necessary due to
restrictions imposed by COBOL. For example, a missing hyphen in a
keyword (e.g. LINE-COUNTER) necessitates a COBOL-related change.
Cosmetic changes include the insertion of blank lines and comments as well
as reformatting without alteration of existing statements.
Report-generation-related changes include those changes necessary to
generate a report. Such changes often involve maintaining page numbers,
manipulating carriage control and determining page breaks. (The Report
Writer feature was not used in any of the programming assignments.) Some
program changes can be placed into two categories. For example, changing

IF LINE-COUNTER > 55

to

IF LINE-KOUNTER > 60

is considered to be both a COBOL-related and report-generation-related
change: COBOL-related because LINE-COUNTER is a COBOL reserved word,
and report-related because 60 lines are now desired rather than 55. As an
example of the intersection between the categories of algorithmic and
COBOL-related, consider changing


IF EMP-NO <> PREV-EMP-NO

to

IF EMP-NO NOT = PREV-EMP-NO AND OLD-EMP

This change is considered algorithmic because an additional condition must
be satisfied, and is considered COBOL-related because "<>" cannot be used
to denote inequality in COBOL.


[2] All COBOL-related program changes were further delineated into 104
error categories such as editing, literals, punctuation, etc.

[3] Each of the four categories of program changes was examined with
respect to when it occurs in the program development process.

The data for our research was obtained from students in an upper-level
COBOL course at Purdue University in the summer of 1980. The students, who
had some experience programming in FORTRAN or PASCAL, were required to
write five programs (COBOL1, COBOL2, ..., COBOL5) as part of the course
requirements. The first program, COBOL1, was disregarded for our purposes
because it did not demand significant programming effort and represented
most students' initial experience with COBOL. The second program, COBOL2,
involved writing a file in readable form. The last three programs, which
were approximately 700-000 lines of code each, involved master file
updating. COBOL5 required changing COBOL4 to include random access. Some
COBOL features that would most likely appear at the professional level,
namely sorting and random access, were employed in COBOL4 and COBOL5 only.
Thus, we could not observe how the frequency of program changes in these
categories behaves over time.

All versions of a particular program submitted for compilation were
captured for each programmer. The average number of versions submitted per
programmer ranged from 12 for COBOL2 to 53 for COBOL4. Instead of examining
all versions from approximately 40 programmers for each programming
assignment, a random sample of 10 programmers was used. A sample size of 10
seemed to be appropriate since 2 random samples, of 10 programmers each,
yielded similar results for COBOL2. For each of these sample groups, Table 1
shows the frequency of changes for each category and the percentage that
frequency is of the total number of changes.


            COBOL          Algorithmic    Cosmetic       Report Generation
            f      %       f      %       f      %       f      %
Group 1     176    23.9    169    25.9    193    20.7    371    50.4
Group 2     151    20.8    184    ?       ?      12.3    407    58.2

Table 1


To examine program changes between two successive versions, a system
utility called "SRCCOM" was used. SRCCOM provided a file of all textual
changes between two versions. This file was then examined manually.

4.  Results

Our results correspond to the three major areas involved in our research.
For each programming assignment, Table 2 shows the frequency of changes for
each category and the percentage that frequency is of the total number of
changes for that assignment. The sum of the percentages is greater than 100%
because of the overlap discussed earlier. Note that algorithmic changes
account for only 25 percent of the changes for COBOL2 but account for over
60 percent of all changes for the last three assignments. Half of all
changes on COBOL2 are report-generation-related, but for COBOL3, 4, and 5
these remain relatively stable, constituting approximately 25 percent of
all changes. Finally, notice that those changes necessitated by problems
with COBOL remain relatively stable at about 20 percent. That is, one of
every five changes is due, at least in part, to problems with the
programming language.


            COBOL          Algorithmic    Cosmetic       Report Generation
            f      %       f      %       f      %       f      %
COBOL2      176    23.9    169    25.9    193    20.7    371    50.4
COBOL3      ?      ?       ?      67.0    177    16.3    203    22.3
COBOL4      199    19.1    ?      67.3    211    16.0    276    20.5
COBOL5      ?      21.4    ?      60.1    106    39.1    44     30.1

Table 2
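
Because a single program change may belong to two categories, the percentages for one assignment can sum to more than 100%. The following Python sketch shows the arithmetic on a small hypothetical sample (four changes, one of them doubly classified); the data are invented for illustration and are not taken from the study.

```python
from collections import Counter


def category_percentages(changes):
    """changes: one set of category names per program change."""
    counts = Counter(cat for cats in changes for cat in cats)
    total = len(changes)  # each change is counted once in the denominator
    return {cat: 100.0 * n / total for cat, n in counts.items()}


# Hypothetical sample: the third change is both COBOL-related and
# report-generation-related, like the LINE-COUNTER example.
sample = [
    {"COBOL"},
    {"algorithmic"},
    {"COBOL", "report generation"},
    {"cosmetic"},
]
percentages = category_percentages(sample)
total_percent = sum(percentages.values())
```

Here the four categories receive 50, 25, 25 and 25 percent, for a total of 125 percent, because the doubly classified change contributes to two numerators but only once to the denominator.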


Appendix 1 represents the delineation of COBOL-related program changes
into the 104 error categories. For each error category, Appendix 1 shows the
frequency of changes made due to this type of error for each programming
assignment. A blank entry in Appendix 1 indicates that the COBOL feature for
the error category in question was not employed in this programming
assignment. For example, only COBOL5 required random access and therefore
there are blank entries for this category for COBOL2, COBOL3 and COBOL4.

For programming assignments COBOL2-COBOL5, Figure 1 shows that
COBOL-related changes are typically made early in the program development
process, whereas Figure 2 shows that cosmetic changes are more frequent at
the end. Figures 3 and 4 show that algorithmic and
report-generation-related changes occur throughout the development process.

5.  Discussion 

As indicated in Table 2, there is a significant number of
report-generation-related changes for each programming assignment. This
suggests that programmers could most likely use some support in generating
reports. One type of support already being used (not in this study) is the
Report Writer feature. A study has shown that programmers find Report
Writer makes the maintenance and generation of reports much easier
|  AudclM  ). However, our research does not suggest that Report Writer is a
panacea, primarily because some changes involved features that exist even
in Report Writer. For example, changes which involved editing were
considered report-generation-related, but clearly such changes may be
necessary even if Report Writer were used.

Our research suggests that the following error categories appear to be
problem areas in COBOL. However, it does not suggest that these categories
are error-prone. Recall that error-proneness is a function of the total
number of usages of a particular language element. Since we did not attempt
to determine the total number of usages for each of the language features
in question, we cannot make any conclusions pertaining to error-proneness.
The frequency of program changes for some of these categories remains
relatively stable over time and therefore these categories appear to be
problem areas. Other categories show potential for being problem areas due
to the relatively high frequency of changes observed.


[1] Data-name qualification.

The program changes that we categorized as "data-name qualification"
involved qualifying non-unique data names. There were considerably more
instances where qualification was omitted entirely than there were
instances where it was inadequately specified. For example, in the context

01 A.
   05 B.
      10 C PIC 99.
01 D.
   05 B.
      10 C PIC XXX.

frequently the data name C was not qualified when referenced in the
procedure division. Proper qualification of C in this context is C OF D
(or C OF B OF D) or C OF A (or C OF B OF A). The function of non-unique
data names in COBOL is twofold: they provide increased flexibility and are
necessary for the proper use of the CORRESPONDING option. Despite
increased flexibility, non-unique data names require qualification, and
qualification actually makes programming in COBOL more cumbersome.
For example, consider the arithmetic statement


MULTIPLY QTY-ON-HAND OF INPUT-QTY
    BY UNIT-PRICE OF PARTS-RECORD
    GIVING TOTAL-COST OF OUTPUT-RECORD.

Data-name qualification appears only to complicate programming and make
such arithmetic expressions less readable. Since COBOL is an inherently
verbose language, it would most likely not suffer if all data names were
required to be unique.
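
The omitted and insufficient qualification errors described above can be made concrete with a short Python sketch. It models a data division as a list of (level, name) pairs and reports which items a reference such as C OF B could denote. The representation and function names are assumptions made for this illustration; they do not correspond to any particular compiler.

```python
def build_paths(entries):
    """entries: (level, name) pairs in declaration order.
    Returns one path per item, listing names from the outermost group inward."""
    stack, paths = [], []
    for level, name in entries:
        # Leave every group whose level number is not lower than this item's.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, name))
        paths.append([n for _, n in stack])
    return paths


def matches(path, reference):
    """reference: [name, qualifier, ...], e.g. C OF B OF A -> ['C', 'B', 'A']."""
    if path[-1] != reference[0]:
        return False
    ancestors = path[-2::-1]  # enclosing groups, innermost first
    pos = 0
    for qualifier in reference[1:]:
        if qualifier not in ancestors[pos:]:
            return False
        pos = ancestors.index(qualifier, pos) + 1
    return True


def resolve(paths, reference):
    """All items the reference could denote; exactly one means it is unambiguous."""
    return [p for p in paths if matches(p, reference)]


# The declarations from the example above.
paths = build_paths([(1, "A"), (5, "B"), (10, "C"),
                     (1, "D"), (5, "B"), (10, "C")])
```

With these declarations, `resolve(paths, ["C"])` and `resolve(paths, ["C", "B"])` each return two candidates (omitted and insufficient qualification respectively), while `["C", "D"]` and `["C", "B", "A"]` each return exactly one.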

[2] CORRESPONDING option.

This feature may be used to reference all fields with common data names
within two different groups [Shel77]. Most program changes made due to
the CORRESPONDING option were attempts to reference the fields intended
within two different groups. For example, in the context

01 A.
   05 C.
      10 D PIC X.
01 B.
   05 C.
      10 E.
         15 D PIC X.


we observed many programmers using a statement such as

MOVE CORRESPONDING A TO B

to move the contents of D in group A to D in group B. However, the intended
move will not occur in this context since D in group B is at a different
level than D in group A. The CORRESPONDING option has pitfalls that have
caused experienced programmers to minimize its use. The main problem is
that it tends to create trouble when a program is changed, as virtually all
programs are if they are used for any length of time. One portion of a
program that generally changes is the format of records. Experience has
shown that record format changes very frequently cause the
CORRESPONDING verb to give undesired results [McCr76]. The effort of
using unique data names and explicitly referencing elementary data items
has the advantage of providing easier maintenance of the program and
reduced chance of error. Thus, it appears that COBOL would not suffer
without the CORRESPONDING option.
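
A simplified model shows why the move in this example does nothing. The Python sketch below treats two elementary items as corresponding only when their names and all intermediate group names, below the two groups named in the statement, are identical. This is an assumption-laden approximation for illustration: it ignores REDEFINES, OCCURS, FILLER and the case where only one item of a pair is elementary, and it assumes both groups are declared at level 01.

```python
def elementary_paths(entries, group):
    """entries: (level, name, is_elementary) in declaration order.
    Returns the name paths, relative to `group`, of elementary items under it."""
    stack, result = [], set()
    for level, name, is_elementary in entries:
        # Leave every group whose level number is not lower than this item's.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, name))
        names = [n for _, n in stack]
        if is_elementary and names[0] == group and len(names) > 1:
            result.add(tuple(names[1:]))
    return result


def corresponding_items(entries, source, target):
    """Relative paths that MOVE CORRESPONDING source TO target would move."""
    return elementary_paths(entries, source) & elementary_paths(entries, target)


# The declarations from the example above.
declared = [
    (1, "A", False), (5, "C", False), (10, "D", True),
    (1, "B", False), (5, "C", False), (10, "E", False), (15, "D", True),
]
moved = corresponding_items(declared, "A", "B")
```

For the example, `moved` is empty: D is reached as C, D under A but as C, E, D under B, so no pair of items corresponds. The same model also hints at the maintenance problem, since inserting one intermediate group into a record silently changes the set of items moved.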


[3] Edited numeric data items as operands in arithmetic expressions.

The restriction that edited numeric data cannot be used in arithmetic
statements often causes a programmer to declare another data name to be
computational in nature. For example, in the context

05 LINE-KOUNTER PIC ZZ9.

the statement ADD 1 TO LINE-KOUNTER is invalid since LINE-KOUNTER is
alphanumeric. COBOL compilers could be written to generate code that
would coerce edited numeric data in much the same way as integers are
coerced in real expressions in FORTRAN. However, the introduction of
coercions into a programming language should be done with considerable
discretion [Tenn81]. For example, consider the declaration

05 FIELD PIC X.

The bit configuration of FIELD, which occupies one byte, can represent a
digit or some other character such as a letter. Since edited numeric data
items are a subset of the set of all alphanumeric data items, it would be
possible to extend coercion to the set of all alphanumeric data items. Such
an extension would allow FIELD to occur as an operand in an arithmetic
expression. However, since FIELD can represent a letter, coercion would
allow computation of the arithmetic expression to continue and possibly
produce bizarre results. Programmers normally do not welcome error
messages, but a message that helps in locating a bug is far more useful than
meaningless output.


[4] Literal continuation.

The program changes related to literal continuation involved correcting a
misplaced single quote or providing a hyphen in column seven. The
frequency of changes made due to invalid literal continuation decreases
rapidly after the second programming assignment, COBOL2 (see Appendix 1).
This rapid decline is due to the abandonment of the technique used to
continue literals. Programmers avoided this technique by adopting other
means for declaring lengthy literals. For example, some programmers
placed the entire literal on a new line whereas others partitioned the
literal into smaller segments.


[5] IF-ELSE pairing convention.

Perhaps the most famous example of ambiguity in a programming language
is the dangling ELSE. Consider the conditional

IF c1
   s1
   IF c2
      s2
ELSE
   s3

example 1

It is not clear to which IF the ELSE corresponds. Without changing the
formal definition or the syntax of COBOL, the ambiguity can be resolved in
one of two ways. The first approach involves introducing the keywords
BEGIN and END (see example 3). The second approach, which is used in
COBOL, is to adopt a convention. The one used in COBOL is that in a nested
IF statement, the first ELSE clause corresponds to the innermost IF
statement [Shel77]. Consider example 1. If s3 is to be executed when c1 is
false, then "ELSE NEXT SENTENCE" must be inserted before the existing
ELSE, since the ELSE which currently exists corresponds to the innermost IF
(see example 6).
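
The pairing convention can be stated as a small stack discipline. The Python sketch below works on a pre-tokenized sentence and pairs each ELSE with the most recent IF that has not yet received one; the token representation is an assumption made for this illustration, not part of any COBOL compiler.

```python
def pair_else(tokens):
    """Return {index of ELSE: index of the IF it is paired with}."""
    open_ifs, pairs = [], {}
    for i, tok in enumerate(tokens):
        if tok == "IF":
            open_ifs.append(i)
        elif tok == "ELSE":
            if not open_ifs:
                raise ValueError("ELSE without a matching IF")
            # The innermost IF still lacking an ELSE takes this one.
            pairs[i] = open_ifs.pop()
        elif tok == ".":
            open_ifs.clear()  # a period ends the sentence and every open IF
    return pairs


# Example 1 as written: the ELSE binds to IF c2, not IF c1.
example_1 = ["IF", "c1", "s1", "IF", "c2", "s2", "ELSE", "s3", "."]

# With ELSE NEXT SENTENCE inserted, the final ELSE binds to IF c1.
with_next_sentence = ["IF", "c1", "s1", "IF", "c2", "s2",
                      "ELSE", "NEXT SENTENCE", "ELSE", "s3", "."]
```

Running `pair_else` on the two token lists shows the effect of the inserted clause: in the first, the only ELSE is paired with the inner IF; in the second, the inner IF consumes ELSE NEXT SENTENCE and the remaining ELSE is paired with the outer IF.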


[6] Dependency upon the period to terminate a conditional.

Since COBOL is very sentence-oriented, the placement of a period after a
statement is natural. However, the use of periods after statements within
conditionals will yield undesirable results. As a result of the dependency
upon the period in an IF statement, programmers often spend much time
debugging programs only to discover the existence of an extra period
in an IF statement. The following conditional illustrates the problem of
period dependency.

IF c1
   READ file-name
      AT END s3
   s1
   s2

example 2

The programmer is forced to put a period at the end of the imperative
clause s3 so that s1 and s2 are not executed upon an end-of-file condition
only. However, placing a period after s3 causes s1 and s2 to be executed
independently of c1, because the period terminates both the AT END clause
and the conditional. Clearly, an "ENDIF" or perhaps an "ENDAT" construct
would eliminate the dependency upon the period. However, until such a
construct is added to the language, COBOL instructors should emphasize
such potential pitfalls.


There have been attempts to simplify COBOL programming by making
COBOL extensible, i.e., allowing the syntax and semantics of COBOL to be
changed. One of the earliest and most commonly proposed schemes for
language extension is the macro definition [Grie71,Aho72]. Already in use
are two macro preprocessors, MetaCOBOL and COBRA, which enhance COBOL
[ADR70,Hami73]. The processor need not precede compilation. Triance et al.
built a macro facility into a COBOL compiler [Tria80]. This compiler is
believed to be the first compiler with a built-in macro facility capable of
recognizing macro calls with arguments. An example of a macro call
specified in a COBOL program is "CSR". This call initiates execution of a
macro which simply replaces the call by "COMPUTATIONAL SYNCHRONIZED RIGHT".
The COBOL macro facility could conceivably be extended to provide support
in the area of nested conditionals. For example, a programmer could
override the convention adopted for IF-ELSE pairing by using the keywords
BEGIN and END. For example, assuming s3 is to be executed if c1 is false,
example 1 could be rewritten as


IF c1
   BEGIN s1
      IF c2
         BEGIN s2 END
   END
ELSE BEGIN s3 END.

example 3


Suppose there were an "ENDIF" construct. Then, assuming s1 and s2 are to be
executed if c1 is true and independently of the imperative clause, example 2
could be rewritten as

IF c1
   READ file-name
      AT END s3.
   s1
   s2
ENDIF

example 4


To utilize a macro facility, a programmer could specify a macro call such as
"CONDITIONAL". For example

CONDITIONAL
IF c1
   BEGIN s1
      IF c2
         BEGIN s2 END
   END
ELSE BEGIN s3 END
ENDCONDITIONAL

example 5


The entire conditional specified between "CONDITIONAL" and
"ENDCONDITIONAL" would be treated as a by-value parameter subject to
interpretation by a particular macro. This macro would generate standard
ANSI COBOL code for the conditional specified. For example, the code
generated for example 5 would be

IF c1
   s1
   IF c2
      s2
   ELSE
      NEXT SENTENCE
ELSE s3.

example 6


6.  Summary 

Since COBOL is a widely used language, there is a need to identify its
problem areas so that improvements can be made in COBOL compilers and in
the manner in which COBOL is taught. Such improvements could yield a
reduction in the number of errors committed by COBOL programmers.

Attempts have been made to identify error-inducing features in COBOL
[Lite76,Youn74]. However, the error frequencies for certain COBOL features
have not been observed with respect to time. Our research attempted to
identify error-inducing features (problem areas) by observing the frequency
of errors for various features over time. Thus the features we have
identified as problem areas are likely to be error-inducing for experienced
as well as novice COBOL programmers. Our study suggests there are at least
six problem areas in COBOL:

[1] Data-name qualification

[2] CORRESPONDING option

[3] Edited numeric data items in arithmetic expressions

[4] Literal continuation

[5] IF-ELSE pairing convention

[6] Dependency upon the period to terminate a conditional


Furthermore, we have suggested approaches that may tend to eliminate
some of those problem areas. For example, we feel that non-unique data
names and the CORRESPONDING option could be eliminated. Edited numeric data
items occurring in arithmetic expressions could be coerced. A macro
facility could be used to alter the syntax of conditionals in COBOL so that
errors related to conditionals can be reduced.

Undoubtedly additional problem areas exist in COBOL. For example, we
could not observe the error frequency for features such as the COBOL sort
facility and random access since these features were not used more than
once by our subjects. Hence, there is a need for further research to
observe the error frequency over time for more advanced features. Upon
identifying those error-inducing features, additional improvements can be
made with respect to COBOL compilers and the teaching of COBOL.
COBOL2

Error Categories


1. Structural Keywords


A. Misspelling

1. ENVIRONMENT

1 

2. DATA DIVISION

2 

B. Missing Keywords

1. Data Division

0 

a. FILE SECTION

b. WORKING-STORAGE SECTION

i 

c.  PICTURE 

0 

2. Procedure Division

0 

a. STOP RUN

b. OPEN
c. CLOSE

d. INPUT/OUTPUT of OPEN

0 

1 

0 

2. Sentence Structure

A. Invalid

0 

B. Data-name qualification

0 

1. Omitted

2. Insufficient

0 

C. Invalid assignment

0 

D. Misspelling

1. ASCENDING

2. CORRESPONDING

3. Editing

A. Zero suppression

1. Truncation of higher-order digits

‘V 

B. Editing symbols

1. ./V

9 

2.  -/S 

1> 

3. S used for zero suppression

i 

4. B used instead of SPACES
5. Editing symbol in PICTURE

.1 

not  intended  to  edit 

) 

C. Use of edited item in

Program 

COBOL3  COBOL4

f  f 


0  2 
1  0 


2  1 

0  0 

3  2 


i  0 

I  0 

3  2 

0  3 


C  3 

5  37 

0  3 

2  0 

1 

2 


0  1 

4 

0  0 

0  0 

0  0 


COBOL5

f 


0 

0 


0 

0 

0 

0 

0 

1 

0 


0 

0 

0 

0 

0 

0 


0 

0 

0 

0 

0 


) 


3 


0 


0 
arithmetic


7  2  5 


4. CORRESPONDING verb
A. Improper use


2B 


5.  Format 

A.  Margins 

1. Left of column 8

2. Right of column 8

3. Left of column 12

4. Right of column 72


0  1  4 
0  9  12 
4  5  7 
3  '1  1 


6. Reserved words used as identifiers

A. PAGE

B. PAGE-COUNTER

C.  LINE-COUNTER 


1  0 

4  0 

0  1 


7. Data description

A. Format

1. Missing keyword ALL in VALUE clause

B. Class

1. Alphanumeric/Numeric

C. Spacing

1. Space between type and length (e.g. PIC X (18))

D. Level number missing


0 

0 


0 

1 

5 

0 


8. Punctuation

A. Period added

1. Within CLOSE statement

2. After FD file-name

3. After VALUE keyword

4. Within OPEN statement

5. Before end of file description

B. Period missing after

1. SOURCE-COMPUTER

2. OBJECT-COMPUTER

3. Group level item

4. PICTURE clause

5. VALUE clause

6. Program name

7. FILE SECTION

8. Paragraph name


1  0 
2  0  0 


1  0 
1  0 


0  3 


1  0 

1  0 

IB  7 

•1  3 

4  2 

0  1 

o  i 

0  0 


3 
o 

i 

i 

0 

4 


9.  Hyphenation 


ooo  O  CO  G  » — i  oo^ococo 


p 


- 


A. Missing in

1. SPECIAL-NAMES  1

2. SOURCE-COMPUTER  1

3. OBJECT-COMPUTER  1

4. HIGH-VALUES  0

5. FILE-LIMIT

B. Added in

1. WORKING-STORAGE-SECTION  0


10. Literals

A. Literal continuation

1. Misplaced hyphen or single quote  11

B. Alphanumeric/Numeric (e.g. PIC 99 VALUE '50')  5

C. Alphanumeric literal

1. Missing quotes  8

2. Length exceeds size of PICTURE  0

3. Invalid delimiter  0

11. Invalid use of figurative constant

A. SPACES  20

12. Conditionals

1  l  l»VU  ini  «  M  1 1 1 »«  n  Mil  l 

1. AND/OR missing  1

2. Not parenthesized properly  12

3. Incorrect use of AND/OR  0


B. Invalid relational operator
1. NOT =/IS UNEQUAL/<>

2. Space not preceding/following relational operator  2

ie  A  <  ’ll  ian  n  o 

C.  jViPinng  iK-nni  0 

e  rumETci  i;i : *  n 

E. Period occurred too early

1. Nested conditional  8

2. Not nested  0

F. Abbreviations

1. Subject and relation omitted in compound conditional which involves a class test (e.g. IF A = B AND NUMERIC)  0


13.  Write  statement 


4 


A. Write WORKING-STORAGE record    10  3  0  0

B. WRITE statements with and without ADVANCING option    3  0  4  0

14. Read statement

A. AT END clause omitted    1  0  1  0

B. Conditional within imperative clause    3  4  0  0

C. READ is not last statement within conditional    0  43  0  0

D. READ file-name-1 INTO file-name-2    0  4  0  0

E. READ file-name TO record-name    0  4  0  0

15. Level 88 items

0 

0 

A. PICTURE clause at level 88

0 

2 

B. Quantity MOVEd to a level 88 item

V 

3 

l) 

n 

i  ■* 

C. Level 88 item MOVEd

D.  Data  name  with  PICTURE  clause 

u 

j 

i  / 

0 

used  as  switch 

0 

0 

* 

16. Redefinition

A. At a level other than 01 did not have the same number of bytes as the item being redefined    0  2  1  0

17. Tables

A. Subscripting

1. No space separating data name and left parenthesis of subscript

2. Subscript missing

3. Subscripted data name used as subscript

4. Data name without OCCURS clause is subscripted

B. OCCURS clause

1. At a level 01

2. PIC X(40) OCCURS 40 TIMES/PIC X OCCURS 40 TIMES

C. Indexing

1. Use of an index other than the index defined for that table

D. SEARCH verb

1. SEARCH the incorrect data name

E. Level structure


11  5  0 

0  5  4 

1  0  0 

0  1  0 

2  1  C 

3  1  0 


0  6  C 


I  0 


1 


1. Improper level number


18. SORT verb

\  ru/sn 

s  i£K  \i)  'RF-niJRN 

wivTi  e.-rei  ease 

*>  i NR/l/Ol TFIJT  procedure  is 

n*»l  o  so  -  •.  i  on 

*•  ii;id’;>KV  f>,u*.^r,iph-naiie  SECTION 

:•  INWl  VOLTECi1  1TXX.TJDUKE  IS 

p  i'M.i  raph-  rur  ?  •  SECT  I  ON 
1 1  Invalid  -mi  l  key 
’*  SEI  Ei  T  clause  for  oorl  file 
nu  ssi  ii’ 

9  I>!iiru*rm  , u*.  .*  -•  • 

A  i inv 1  :*■  ■  r,p . ni  division 

Invalid  SELECT  clause 


[Figure: cosmetic changes and report-related changes plotted against % of total compiled (horizontal axis from 10 to 100)]


